CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators
Abstract
Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present the CHO benchmark suite, an extension of CHStone, a commonly used C-based high-level synthesis benchmark suite, for OpenCL. We characterise CHO at various levels and use it to investigate compiling non-trivial software to FPGAs.
1 Introduction
Open Computing Language (OpenCL) [1] is an open standard for platform-independent, general-purpose parallel programming across CPUs, GPUs and accelerators. OpenCL consists of an API for coordinating parallel computation and a cross-platform programming language (a subset of ISO C99 with extensions for parallelism). It allows software to be written once and executed on any device that supports OpenCL. Its execution model consists of a host device that submits computationally intensive kernels to compute devices for execution.
Programming Field-Programmable Gate Arrays (FPGAs) with OpenCL-based High-Level Synthesis (HLS) frameworks is now becoming mainstream, with active support from the major FPGA vendors [2]. HLS is the automatic conversion of an algorithmic description into either a low-level Register Transfer Level (RTL) description or a digital circuit [3]. RTL refers to the low-level design abstraction that models a digital circuit in terms of the flow of digital signals between registers and the logical operations performed on those signals. HLS allows a designer to work more productively at a higher level of abstraction and achieve faster time-to-market than with error-prone and difficult-to-debug RTL. Further, frameworks that use software programming languages such as C and OpenCL open up the power of FPGAs to software engineers (who outnumber hardware engineers by an order of magnitude).
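The host and kernel execution model described earlier in this section can be illustrated with a minimal sketch. The code below is a plain Python emulation and does not use the OpenCL API: the `Device` class, its `enqueue_*` methods, and `run_vector_add` are hypothetical names chosen for illustration, and a sequential loop stands in for the parallel execution a real compute device would provide. It only shows the general shape of the model: the host copies data to the device, submits a kernel over an index range, and copies the result back.

```python
# Plain-Python emulation of the OpenCL host/device split (NOT the OpenCL API).
# All names here (Device, enqueue_write, ...) are hypothetical.


class Device:
    """Stands in for a compute device with its own, separate memory."""

    def __init__(self):
        self.buffers = {}  # device-side memory, kept apart from host data

    def enqueue_write(self, name, host_data):
        # Explicit host -> device transfer: the device stores its own copy.
        self.buffers[name] = list(host_data)

    def enqueue_kernel(self, kernel, global_size, *buffer_names):
        # Run one kernel instance (work-item) per index of a 1-D range.
        # A real device may run these in parallel; here they run in order.
        args = [self.buffers[n] for n in buffer_names]
        for global_id in range(global_size):
            kernel(global_id, *args)

    def enqueue_read(self, name):
        # Explicit device -> host transfer: the host receives a copy.
        return list(self.buffers[name])


def vector_add(global_id, a, b, out):
    """Kernel body: each work-item handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def run_vector_add(a, b):
    """Host side: copy inputs in, launch the kernel, copy the result out."""
    device = Device()
    device.enqueue_write("a", a)
    device.enqueue_write("b", b)
    device.enqueue_write("out", [0] * len(a))
    device.enqueue_kernel(vector_add, len(a), "a", "b", "out")
    return device.enqueue_read("out")


if __name__ == "__main__":
    print(run_vector_add([1, 2, 3], [10, 20, 30]))  # prints [11, 22, 33]
```

In this sketch the host never touches `device.buffers` directly; every transfer goes through an explicit write or read, which mirrors how an OpenCL host program moves data to and from a compute device.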
Benchmarking is an important technique for analysing the performance of systems by studying the execution of benchmark applications chosen to represent the applications of interest. A good HLS benchmark suite should allow HLS framework developers to qualitatively evaluate new ideas as well as serve as a standard for benchmarking the diverse HLS frameworks available. From our discussions with HLS users (especially non-FPGA experts), who have to choose from the myriad of HLS frameworks, the second objective is as important as the first.
In this paper, we introduce CHO: a suite of benchmark applications for OpenCL-based HLS platforms that meets the objectives set out above. CHO is a rewrite of the C-based CHStone benchmark suite [4]. CHStone is a commonly used HLS benchmark suite and consists of 12 applications from diverse application domains. Although based largely on C, OpenCL differs in some aspects from the standard C language. For example, OpenCL has disjoint memory spaces, and moving data from one memory space to another needs to be done explicitly. Hence, moving from one language to the other is often not straightforward.
This paper makes the following contributions:
• We present CHO, an OpenCL port of the CHStone HLS benchmark, allowing for the benchmarking of OpenCL-based HLS.
• We characterise CHO at various levels.
• We use CHO and a state-of-the-art OpenCL HLS framework to evaluate compiling non-trivial programs to FPGAs.
2 Related Work
Benchmarking of HLS frameworks does not have a rich history when compared to benchmarking of software platforms and compilers. Early HLS framework developers tended to use their own choice of applications for evaluation. In the 1990s, the HLS community attempted to standardize benchmarking by proposing the 1992 High-Level Synthesis Workshop Benchmarks [5] and the 1995 High-Level Synthesis Design Repository [6].
These benchmarks covered a number of different applications and application domains but were mostly written in algorithmic VHDL. VHDL is a type of Hardware Description Language
Publication date: 2015